skip to main content


Search for: All records

Creators/Authors contains: "Pei, Jian"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Shapley value provides a unique way to fairly assess each player's contribution in a coalition and has enjoyed many applications. However, the exact computation of Shapley value is #P-hard due to the combinatoric nature of Shapley value. Many existing applications of Shapley value are based on Monte-Carlo approximation, which requires a large number of samples and the assessment of utility on many coalitions to reach high quality approximation, and thus is still far from being efficient. Can we achieve an efficient approximation of Shapley value by smartly obtaining samples? In this paper, we treat the sampling approach to Shapley value approximation as a stratified sampling problem. Our main technical contributions are a novel stratification design and two sample allocation methods based on Neyman allocation and empirical Bernstein bound, respectively. Experimental results on several real data sets and synthetic data sets demonstrate the effectiveness and efficiency of our novel stratification design and sampling approaches. 
    more » « less
    Free, publicly-accessible full text available May 26, 2024
  2. A large amount of high-dimensional and heterogeneous data appear in practical applications, which are often published to third parties for data analysis, recommendations, targeted advertising, and reliable predictions. However, publishing these data may disclose personal sensitive information, resulting in an increasing concern on privacy violations. Privacy-preserving data publishing has received considerable attention in recent years. Unfortunately, the differentially private publication of high dimensional data remains a challenging problem. In this paper, we propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases: a Markov-blanket-based attribute clustering phase and an invariant post randomization (PRAM) phase. Specifically, splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable allocation of privacy budget, while a double-perturbation mechanism satisfying local differential privacy facilitates an invariant PRAM to ensure no loss of statistical information and thus significantly preserves data utility. We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy. We conduct extensive experiments on four real-world datasets and the experimental results demonstrate that our mechanism can significantly improve the data utility of the published data while satisfying differential privacy. 
    more » « less
  3. Molecular doping can increase the conductivity of organic semiconductors and plays an increasingly important role in emerging and established plastic electronics applications. 4-(1,3-Dimethyl-2,3-dihydro-1 H -benzimidazol-2-yl)- N , N -dimethylaniline (N-DMBI-H) and tris(pentafluorophenyl)borane (BCF) are established n- and p-dopants, respectively, but neither functions as a simple one-electron redox agent. Molecular hydrogen has been suggested to be a byproduct in several proposed mechanisms for doping using both N-DMBI-H and BCF. In this paper we show for the first time the direct detection of molecular hydrogen in the uncatalysed doping of a variety of polymeric and molecular semiconductors using these dopants. Our results provide insight into the doping mechanism, providing information complementary to that obtained from more commonly applied methods such as optical, electron spin resonance, and electrical measurements. 
    more » « less
  4. null (Ed.)
    Data-driven machine learning has become ubiquitous. A marketplace for machine learning models connects data owners and model buyers, and can dramatically facilitate data-driven machine learning applications. In this paper, we take a formal data marketplace perspective and propose the first en D -to-end mod e l m a rketp l ace with diff e rential p r ivacy ( Dealer ) towards answering the following questions: How to formulate data owners' compensation functions and model buyers' price functions? How can the broker determine prices for a set of models to maximize the revenue with arbitrage-free guarantee, and train a set of models with maximum Shapley coverage given a manufacturing budget to remain competitive ? For the former, we propose compensation function for each data owner based on Shapley value and privacy sensitivity, and price function for each model buyer based on Shapley coverage sensitivity and noise sensitivity. Both privacy sensitivity and noise sensitivity are measured by the level of differential privacy. For the latter, we formulate two optimization problems for model pricing and model training, and propose efficient dynamic programming algorithms. Experiment results on the real chess dataset and synthetic datasets justify the design of Dealer and verify the efficiency and effectiveness of the proposed algorithms. 
    more » « less